Approximate Regular Expression Searching with Arbitrary Integer Weights

نویسنده

  • Gonzalo Navarro
چکیده

We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn/ logk s), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the complexity of the best previous techniques, but it is simpler and faster in practice. In our way, we define a new recurrence for approximate searching where the current values depend only on previous values. Interestingly, our algorithm turns out to be a relevant option also for simple approximate string matching with arbitrary integer weights. ACM CCS

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An approximate model for slug flow heat transfer in channels of arbitrary cross section

In this paper, a novel approximate solution to determine the Nusselt number for thermally developed, slug (low-prandtl), laminar, single phase flow in channels of arbitrary cross section is presented. Using the Saint-Venant principle in torsion of beams, it is shown that the thermally developed Nusselt number for low-prandtl flow is only a function of the geometrical parameters of the channel c...

متن کامل

Approximate Regular Expression Matching

We extend the de nition of Hamming and Levenshtein distance between two strings used in approximate string matching so that these two distances can be used also in approximate regular expression matching. Next, the methods of construction of nondeterministic nite automata for approximate regular expression matching considering both mentioned distances are presented.

متن کامل

Large Text Searching Allowing Errors

We present a full inverted index for exact and approximate string matching in large texts. The index is composed of a table containing the vocabulary of words of the text and a list of positions in the text corresponding to each word. The size of the table of words is usually much less than 1% of the text size and hence can be kept in main memory, where most query processing takes place. The te...

متن کامل

Expressing Context-Free Tree Languages by Regular Tree Grammars

In this thesis, three methods are investigated to express context-free tree languages by regular tree grammars. The first method is a characterization. We show restrictions to context-free tree grammars such that, for each restricted context-free tree grammar, a regular tree grammar can be constructed that induces the same tree language. The other two methods are approximations. An arbitrary co...

متن کامل

Gapped Suffix Arrays: a New Index Structure for Fast Approximate Matching

Approximate searching using an index is an important application in many fields. In this paper we introduce a new data structure called the gapped suffix array for approximate searching in the Hamming distance model. Building on the well known filtration approach for approximate searching, the use of the gapped suffix array can improve search speed by avoiding the merging of position lists.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Nord. J. Comput.

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2003